1-7. (Cancelled)

8. (Currently Amended) A system for clustering documents in datasets comprising: 
a storage having a first dataset and a second dataset; 

a cluster generator operative to cluster first documents in said first dataset and produce
first document classes;

a centroid seed generator operative to generate centroid seeds based on said first 
document classes [[,]]; 

a dictionary generator adapted to generate a first dictionary of most common words in
said first dataset; and

a vector space model generator adapted to generate a first vector space model by
counting, for each word in said first dictionary, a number of said first documents in which said
word occurs,

wherein said cluster generator clusters said documents in said first dataset based on
said first vector space model,

wherein said cluster generator clusters second documents in said second dataset using 
said centroid seeds, such that said second dataset has a similar clustering to that of said first 
dataset, and 

wherein said second dataset comprises a new, but related, based on said centroid seeds,
dataset different than said first dataset.

9. (Cancelled). 

10. (Cancelled). 

12. (Original) The system in claim 11, further comprising a classifier adapted to classify said
second documents in said second vector space model using said first document classes to
produce a classified second vector space model and adapted to determine a mean of vectors in
each class in said classified second vector space model, wherein said mean comprises said
centroid seeds.



13. (Original) The system in claim 11, wherein:

said dictionary generator is adapted to generate a second dictionary of most common 
words in said second dataset, 

said vector space model generator is adapted to generate a third vector space model by 
counting, for each word in said second dictionary, a number of said second documents in which 
said word occurs, and 

said cluster generator is adapted to cluster said second documents in said second dataset 
based on said third vector space model to produce a second dataset cluster. 

14. (Original) The system in claim 13, wherein said cluster generator is adapted to produce an 
adapted dataset cluster by clustering said second documents in said second dataset using said 
centroid seeds and said system further comprises: 

a comparator adapted to compare classes in said adapted dataset cluster to classes in said 
second dataset cluster and add classes to said adapted dataset cluster based on said comparing. 

15. (Currently Amended) A method of clustering documents in a first dataset having first
documents and a related second dataset having second documents, said method comprising: 

clustering said first documents to produce first document classes; 
generating a vector space model of said second documents; 

classifying said vector space model of said second documents using said first document 
classes to produce a classified vector space model; and 

determining a mean of vectors in each class in said classified vector space model to 
produce centroid seeds; and 

clustering said second documents using said centroid seeds, such that said second dataset 
has a similar clustering to that of said first dataset, 
wherein said second dataset comprises a new, but related, based on said centroid seeds,
dataset different than said first dataset,

wherein said vector space model comprises a second vector space model, and
clustering of said first documents in said first dataset comprises:

forming a first dictionary of most common words in said first dataset; and
generating a first vector space model by counting, for each word in said first
dictionary, a number of said first documents in which said word occurs,

wherein said clustering of said first documents in said first dataset is based on said first
vector space model.

16. (Cancelled). 

17. (Original) The method in claim 16, wherein said generating of said second vector space
model comprises counting, for each word in said first dictionary, a number of said second 
documents in which said word occurs. 

18. (Original) The method in claim 17, further comprising:

forming a second dictionary of most common words in said second dataset; 

generating a third vector space model by counting, for each word in said second 
dictionary, a number of said second documents in which said word occurs; and 

clustering said documents in said second dataset based on said third vector space model 
to produce a second dataset cluster. 






19. (Original) The method in claim 18, wherein said clustering of said second documents in
said second dataset using said centroid seeds produces an adapted dataset cluster and said method
further comprises:

comparing classes in said adapted dataset cluster to classes in said second dataset cluster;
and

adding classes to said adapted dataset cluster based on said comparing.



20. (Currently Amended) A method of clustering documents in related datasets comprising: 
forming a first dictionary of most common words in a first dataset;

generating a first vector space model by counting, for each word in said first dictionary, a
number of said first documents in which said word occurs;

clustering said first documents in said first dataset based on said first vector space model
to produce first document classes;

generating a second vector space model by counting, for each word in said first 
dictionary, a number of said second documents in which said word occurs; 

classifying said second documents in said second vector space model using said first 
document classes to produce a classified second vector space model; 

determining a mean of vectors in each class in said classified second vector space model
to produce centroid seeds; and

clustering second documents in a second dataset using said centroid seeds, such that said 
second dataset has a similar clustering to that of said first dataset, 

wherein said second dataset comprises a new, but related, based on said centroid seeds,
dataset different than said first dataset.
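
For illustration only, the sequence of steps recited in claim 20 might be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the whitespace tokenizer, the use of k-means with Euclidean distance, the choice of the first k vectors as initial seeds for the first dataset, and the fallback for an empty class are all hypothetical. The "vector space model" is read here as one word-count vector per document over the first dictionary.

```python
import math
from collections import Counter

def build_dictionary(docs, size):
    """Most common words, ranked by the number of documents containing them."""
    doc_freq = Counter(w for d in docs for w in set(d.lower().split()))
    return [w for w, _ in doc_freq.most_common(size)]

def vectorize(docs, dictionary):
    """One word-count vector per document over the given dictionary."""
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in dictionary])
    return vectors

def nearest(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance, an assumption)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, seeds, iters=20):
    """Plain k-means started from the supplied seeds."""
    centroids = [list(s) for s in seeds]
    labels = []
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a class is empty
                centroids[k] = mean(members)
    return labels, centroids

def adapt_clustering(first_docs, second_docs, k, dict_size=50):
    # First dictionary and first vector space model.
    dictionary = build_dictionary(first_docs, dict_size)
    first_vsm = vectorize(first_docs, dictionary)
    # Cluster the first documents (initial seeds: first k vectors, arbitrary).
    _, first_centroids = kmeans(first_vsm, first_vsm[:k])
    # Second vector space model, built over the FIRST dictionary.
    second_vsm = vectorize(second_docs, dictionary)
    # Classify second documents using the first document classes.
    classified = [nearest(v, first_centroids) for v in second_vsm]
    # Centroid seeds: mean of the second-dataset vectors in each class.
    seeds = []
    for c in range(k):
        members = [v for v, lab in zip(second_vsm, classified) if lab == c]
        seeds.append(mean(members) if members else first_centroids[c])
    # Cluster the second documents starting from those seeds.
    labels, _ = kmeans(second_vsm, seeds)
    return labels, seeds

first = ["cat dog pet", "stock market price", "dog cat animal", "market stock trade"]
second = ["my dog and cat", "the stock market fell", "cat pet food", "market price rose"]
labels, seeds = adapt_clustering(first, second, k=2)
print(labels)
```

Because the seeds are computed from the second dataset's own vectors but grouped by the first dataset's classes, class k in the second clustering starts out aligned with class k of the first, which is one way to obtain the "similar clustering" the claim recites.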

21. (Original) The method in claim 20, further comprising:

forming a second dictionary of most common words in said second dataset;

generating a third vector space model by counting, for each word in said second
dictionary, a number of said second documents in which said word occurs; and

clustering said documents in said second dataset based on said third vector space model
to produce a second dataset cluster.



22. (Original) The method in claim 21, wherein said clustering of said second documents in
said second dataset using said centroid seeds produces an adapted dataset cluster and said method
further comprises:

comparing classes in said adapted dataset cluster to classes in said second dataset cluster;
and

adding classes to said adapted dataset cluster based on said comparing.
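
The claims do not state how the classes are compared or when a class is added. Purely as a hypothetical illustration, the sketch below represents each class as a set of document identifiers and adds a class from the second dataset cluster when its best Jaccard overlap with every class in the adapted dataset cluster falls below a threshold. The function name, the Jaccard criterion, and the `min_overlap` parameter are assumptions, not taken from the claims.

```python
def add_missing_classes(adapted, independent, min_overlap=0.5):
    """Return the adapted classes plus any independent class not matched by them.

    adapted, independent: lists of sets of document ids, one set per class.
    A class counts as matched when its Jaccard overlap with some adapted
    class is at least min_overlap (a hypothetical criterion).
    """
    result = [set(c) for c in adapted]
    for candidate in independent:
        best = max(
            (len(candidate & c) / len(candidate | c) for c in adapted if candidate | c),
            default=0.0,
        )
        if best < min_overlap:
            result.append(set(candidate))
    return result

adapted = [{0, 1, 2}, {3, 4, 5, 6, 7}]
independent = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
print(add_missing_classes(adapted, independent))
```

In this sketch an added class may share documents with an existing one; reassigning documents after the addition (for example by re-running the clustering with the enlarged set of classes) is left out.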

23. (Currently Amended) A program device readable by machine tangibly embodying a 
program of instructions executable by the machine to perform a method of clustering documents 
in datasets comprising: 

clustering first documents in a first dataset to produce first document classes; 
creating centroid seeds based on said first document classes; and 
clustering second documents in a second dataset using said centroid seeds, such that said 
second dataset has a similar clustering to that of said first dataset, 

wherein said second dataset comprises a new, but related, based on said centroid seeds,
dataset different than said first dataset,

wherein said clustering of said first documents in said first dataset comprises:
forming a first dictionary of most common words in said first dataset;
generating a first vector space model by counting, for each word in said first
dictionary, a number of said first documents in which said word occurs; and

clustering said first documents in said first dataset based on said first vector space
model.

24. (Canceled). 

25. (Cancelled). 

26. (Previously Presented) A program device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform said method in claim 25, said 
method further comprising generating a second vector space model by counting, for each word in 
said first dictionary, a number of said second documents in which said word occurs.



27. (Previously Presented) A program device readable by machine, tangibly embodying a
program of instructions executable by the machine to perform said method in claim 26, wherein 
said creating of said centroid seeds comprises: 

classifying said second vector space model using said first document classes to produce a 
classified second vector space model; and 

determining a mean of vectors in each class in said classified second vector space model, 
wherein said mean comprises said centroid seeds. 

28. (Previously Presented) A program device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform said method in claim 26, said 
method further comprising: 

forming a second dictionary of most common words in said second dataset; 

generating a third vector space model by counting, for each word in said second 
dictionary, a number of said second documents in which said word occurs; and 

clustering said documents in said second dataset based on said third vector space model 
to produce a second dataset cluster. 

29. (Previously Presented) A program device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform said method in claim 28, wherein 
said clustering of said second documents in said second dataset using said centroid seeds
produces an adapted dataset cluster and said method further comprises: 

comparing classes in said adapted dataset cluster to classes in said second dataset cluster;
and

adding classes to said adapted dataset cluster based on said comparing. 
30. (Canceled). 